A linguistic and prosodic database for data-driven Japanese TTS synthesis

Authors

  • Atsuhiro Sakurai
  • Takashi Natsume
  • Keikichi Hirose
Abstract

We propose a method to generate a database that contains a parametric representation of F0 contours associated with linguistic and acoustic information, to be used by data-driven Japanese text-to-speech (TTS) systems. The configuration of the database includes recorded speech, F0 contours and their parametric labels, phonetic transcription with durations, and other linguistic information such as orthographic transcription, part-of-speech (POS) tags, and accent types. All information that is not available by dictionary lookup is obtained automatically. In this paper, we propose a method to automatically obtain parametric labels that describe F0 contours based on a superpositional model. Preliminary tests on a small data set show that the method can find the parametric representation of F0 contours with acceptable accuracy, and that accuracy can be improved by introducing additional linguistic information.
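
The abstract does not say which superpositional model the parametric labels belong to. As a rough illustration only, the sketch below assumes a Fujisaki-style command-response formulation, in which the logarithm of F0 is a base value plus the summed responses to phrase commands and accent commands. All class names, function names, and the constants `alpha=3.0`, `beta=20.0`, and `gamma=0.9` are hypothetical choices for this sketch, not values taken from the paper.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class PhraseCommand:
    """Hypothetical parametric label: an impulse that starts a phrase component."""
    onset: float      # seconds
    magnitude: float


@dataclass
class AccentCommand:
    """Hypothetical parametric label: a step that spans an accent component."""
    onset: float      # seconds
    offset: float     # seconds
    amplitude: float


def phrase_response(t: float, alpha: float = 3.0) -> float:
    # Impulse response of the phrase control mechanism (assumed form).
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0


def accent_response(t: float, beta: float = 20.0, gamma: float = 0.9) -> float:
    # Step response of the accent control mechanism, capped at gamma (assumed form).
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def f0_contour(times: List[float], base_f0: float,
               phrases: List[PhraseCommand],
               accents: List[AccentCommand]) -> List[float]:
    """Superpose phrase and accent components on log(base_f0); return F0 in Hz."""
    contour = []
    for t in times:
        log_f0 = math.log(base_f0)
        for p in phrases:
            log_f0 += p.magnitude * phrase_response(t - p.onset)
        for a in accents:
            log_f0 += a.amplitude * (
                accent_response(t - a.onset) - accent_response(t - a.offset)
            )
        contour.append(math.exp(log_f0))
    return contour


if __name__ == "__main__":
    # Illustrative command values, not data from the paper.
    phrases = [PhraseCommand(onset=0.0, magnitude=0.5)]
    accents = [AccentCommand(onset=0.2, offset=0.5, amplitude=0.4)]
    times = [i * 0.1 for i in range(11)]
    for t, f0 in zip(times, f0_contour(times, 100.0, phrases, accents)):
        print(f"{t:.1f} s  {f0:.1f} Hz")
```

Under this assumption, the labeling task described in the abstract amounts to searching for the command onsets, offsets, and magnitudes whose generated contour best matches the F0 contour extracted from the recorded speech.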

Similar resources

Template-driven generation of prosodic information for Chinese concatenative synthesis

In this paper, a template-driven generation of prosodic information is proposed for Chinese text-to-speech conversion. A set of monosyllable-based synthesis units is selected from a large continuous speech database. The speech database is employed to establish a word-prosody-based template tree according to the linguistic features: tone combination, word length, part-of-speech (POS) of the word...

Design and evaluation of a speech synthesis model based on concatenation of prosodic-context-sensitive units

This paper describes the design and evaluation of prosodically-sensitive concatenative units for a Persian text-to-speech (TTS) synthesis system. The syllables used are prosodically conditioned in the sense that a single conventional syllable is stored as different versions taken directly from the different prosodic domains of the prosodically labeled, read sentences. The three levels of the Per...

Perceptual Evaluation of Quality Deterioration Owing to Prosody Modification

Our research goal is to construct a Japanese TTS (text-to-speech) system that can output various kinds of prosody. Since such synthetic speech is useful in practice, many TTS systems have implemented global prosodic control processing. But fundamentally they are designed to output speech with standard pitch and speech rate. We discuss a synthesis method for high-quality speech with extrem...

Statistical Methods in Data-driven Modeling of Spanish Prosody for Text to Speech

In [1], we proposed an automatic data-driven methodology to model both fundamental frequency and segmental duration in TTS converters from a monospeaker recorded corpus. It therefore had the advantage that it could be adapted to a specific corpus or a particular speaker. The main disadvantage was the size of the obtained prosodic database. In this paper, we propose to use some statistical metho...

Publication date: 1998